Distributed Data Clustering

Author

Abdelhamid Bouchachia

Abstract

To make effective use of distributed information, it is desirable to allow coordination and collaboration among various information sources. This paper deals with clustering data emanating from different sites. The clustering process consists of three steps: find the (local) clusters of the data at each site; find (higher) clusters from the union of the distributed data sets at the central site; and finally compute the associations between the two sets of clusters. The approach aims to discover the hidden structure of multi-source data and to assign unseen data points coming from a site to the right higher cluster without any need to access their feature values. The proposed approach is evaluated experimentally.
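
The abstract names the three steps but does not fix the clustering algorithm or say how the associations are computed. The sketch below is one possible reading, not the paper's method: it assumes plain k-means (with a deterministic farthest-point seeding, also an assumption) for both the local and the higher clustering, and it measures the association between a local cluster and a higher cluster as the fraction of the local cluster's members that land in that higher cluster, an estimate of P(higher | local). The function names `kmeans`, `distributed_clustering` and `assign_unseen` and the two-site toy data are hypothetical. The point it illustrates is the last step: an unseen point is routed to a higher cluster using only its site id and local cluster index, so its feature values never reach the central site.

```python
from collections import Counter
import random


def sqdist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda j: sqdist(p, centers[j]))


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point seeding (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from every seed chosen so far.
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Move the center to its members' mean; keep it if the cluster is empty.
            new.append(tuple(sum(c) / len(members) for c in zip(*members))
                       if members else centers[j])
        if new == centers:
            break
        centers = new
    return centers, [nearest(p, centers) for p in points]


def distributed_clustering(sites, k_local, k_higher):
    """Steps 1-3: local clusters, higher clusters, local-to-higher associations."""
    # Step 1: every site clusters its own data independently.
    local = [kmeans(pts, k_local) for pts in sites]
    # Step 2: the central site clusters the union of all site data sets.
    union = [p for pts in sites for p in pts]
    higher_centers, higher_labels = kmeans(union, k_higher)
    # Step 3: per site, estimate P(higher cluster | local cluster) by counting how
    # the members of each local cluster are spread over the higher clusters.
    assoc, start = [], 0
    for (_, labels), pts in zip(local, sites):
        pairs = Counter(zip(labels, higher_labels[start:start + len(pts)]))
        start += len(pts)
        table = []
        for j in range(k_local):
            row = [pairs[(j, h)] for h in range(k_higher)]
            total = sum(row) or 1  # avoid division by zero for an empty local cluster
            table.append([n / total for n in row])
        assoc.append(table)
    return [centers for centers, _ in local], higher_centers, assoc


def assign_unseen(site_id, point, local_centers, assoc):
    """Route an unseen point to a higher cluster using only its local cluster id."""
    # At the site: the point's features are used to find its local cluster.
    j = nearest(point, local_centers[site_id])
    # At the central site: only the index j is needed, never the features.
    row = assoc[site_id][j]
    return max(range(len(row)), key=row.__getitem__)


# Usage: two hypothetical sites that share one region of the feature space.
rng = random.Random(0)


def blob(cx, cy, n):
    """n Gaussian points around (cx, cy); toy data for illustration only."""
    return [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for _ in range(n)]


site_a = blob(0, 0, 30) + blob(10, 10, 30)
site_b = blob(0, 10, 30) + blob(10, 10, 30)
local_centers, higher_centers, assoc = distributed_clustering([site_a, site_b], 2, 3)
h = assign_unseen(1, (9.8, 10.1), local_centers, assoc)
print([round(v) for v in higher_centers[h]])
```

Because the association table is indexed by local cluster, a site only has to send the index of the local cluster a new point falls into; the argmax over that row then picks the higher cluster. Other association measures (for example, distances between local and higher centers) would fit the same three-step outline.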

Similar articles

Entropy-based Consensus for Distributed Data Clustering

The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...

DISTRIBUTED AND COLLABORATIVE FUZZY MODELING

In this study, we introduce and study a concept of distributed fuzzy modeling. Fuzzy modeling encountered so far is predominantly of a centralized nature by being focused on the use of a single data set. In contrast to this style of modeling, the proposed paradigm of distributed and collaborative modeling is concerned with distributed models which are constructed in a highly collaborative fashion. I...

Improving Imbalanced data classification accuracy by using Fuzzy Similarity Measure and subtractive clustering

Classification is one of the important parts of data mining and knowledge discovery. In most cases, the data used for training is not well distributed. This inappropriate distribution occurs when one class has a large number of samples while the number of samples in the other class is inherently low. In general, the methods of solving this kind of prob...

Outlier Detection in Wireless Sensor Networks Using Distributed Principal Component Analysis

Detecting anomalies is an important challenge for intrusion detection and fault diagnosis in wireless sensor networks (WSNs). To address the problem of outlier detection in wireless sensor networks, in this paper we present a PCA-based centralized approach and a DPCA-based distributed energy-efficient approach for detecting outliers in sensed data in a WSN. The outliers in sensed data can be ca...

Distributed Balanced Clustering via Mapping Coresets

Large-scale clustering of data points in metric spaces is an important problem in mining big data sets. For many applications, we face explicit or implicit size constraints for each cluster which leads to the problem of clustering under capacity constraints or the “balanced clustering” problem. Although the balanced clustering problem has been widely studied, developing a theoretically sound di...

A Distributed and Parallel Clustering Algorithm for Massive Biological Data

Distributed processing today is a highly advantageous technology for bridging together multiple computers and processor systems to run applications. Distributed processing has made it possible to cut processing time and therefore reduce costs. Using this, we aim to address clustering techniques by developing a new method for further reduction in time and costs. The problem of cl...

Publication date: 2003